Final Presentation

Group 3

December 14, 2017

Group 3 Objective: Association between dependent and independent variables

Our objective was to summarize and visualize how the dependent variable is associated with each independent variable, and to develop several models for analyzing these associations in the data. Our work included:

Rationale:

Idea development:

Functions, Get Ready!!!:

Image retrieved from: https://www.pinterest.com/pin/233624299389735946/

Function 1: Fitting Linear Regression Model

Linear Regression Function Code

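# Requires dplyr, tidyr, purrr, broom, forcats, ggplot2, and ggthemes
linear_model <- function(peak_trough, dep_var, 
                      data = efficacy_summary) {
  # dep_var must name one of the two outcome columns
  stopifnot(dep_var %in% c("ELU", "ESP"))

  # Keep one level (e.g. "Cmax") and reshape to long format:
  # one row per observation and independent variable
  function_data <- data %>% 
    filter(level == peak_trough) %>% 
    gather(key = independent_var, value = indep_measure, 
           -drug, -dosage, -dose_int, -level, -ELU, -ESP, 
           na.rm = TRUE) %>% 
    select(drug, dosage, dose_int, level, all_of(dep_var), 
           indep_measure, independent_var) 

  # Copy the chosen outcome into a fixed column name for lm()
  function_data$vect <- function_data[[dep_var]]

  # Regress the outcome on the standardized independent variable
  model_function <- function(data) {
    lm(vect ~ scale(indep_measure), data = data)
  }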

Linear Model Function Code, Continued

  estimate_results <- function_data %>% 
    group_by(independent_var, dose_int) %>% 
    nest() %>% 
    mutate(mod_results = purrr::map(data, 
                            model_function)) %>% 
    mutate(mod_coefs = purrr::map(mod_results, 
                            broom::tidy)) %>% 
    select(independent_var, dose_int, mod_results, 
                            mod_coefs) %>% 
    unnest(mod_coefs) %>% 
    filter(term == "scale(indep_measure)")

Linear Model Function Code, Continued

  # Plot one point per model; point size is inversely
  # proportional to the standard error of the coefficient
  coef_plot <- estimate_results %>%
    ungroup() %>%  # reorder factor levels across all rows
    mutate(independent_var = forcats::fct_reorder(
        independent_var, estimate, .fun = max)) %>%
    rename(Dose_Interval = dose_int) %>% 
    ggplot(aes(x = estimate, y = independent_var, 
              color = Dose_Interval)) +
    geom_point(aes(size = 1 / std.error)) +
    scale_size_continuous(guide = "none") +
    theme_few() + 
    ggtitle(label = paste("Linear model coefficients as function",
                          "of independent variables,\nby drug dose",
                          "and model uncertainty"), 
            subtitle = paste("Smaller points have more uncertainty",
                             "than larger points")) +
    geom_vline(xintercept = 0, color = "cornflowerblue") 
  
  coef_plot
}
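
The sketch below illustrates what each per-variable model in linear_model estimates: the slope of the outcome on a standardized predictor, together with its standard error. It is not the group's code. It uses the built-in mtcars data as a stand-in for efficacy_summary, with mpg standing in for the dependent variable and wt and hp for two independent variables; the helper name fit_scaled is hypothetical.

```r
# Stand-in data: mtcars is NOT the efficacy data set used in the slides.
# mpg plays the role of the dependent variable (e.g. ELU);
# wt and hp play the role of two independent variables.
predictors <- c("wt", "hp")

# Fit one simple regression per predictor on the standardized predictor,
# so each slope reads as "change in outcome per one SD of the predictor".
fit_scaled <- function(var, data) {
  fit <- lm(data[["mpg"]] ~ scale(data[[var]]))
  est <- summary(fit)$coefficients[2, c("Estimate", "Std. Error")]
  data.frame(independent_var = var,
             estimate = est[["Estimate"]],
             std.error = est[["Std. Error"]])
}

scaled_coefs <- do.call(rbind, lapply(predictors, fit_scaled, data = mtcars))
scaled_coefs
```

Because the predictors are standardized, the estimates are on a common scale and can be compared across variables, which is what the coefficient plot relies on.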

Linear Model Function, Input Parameters:

Linear Model- Visualize independent variable coefficients - ELU

# Sample call to linear_model (Cmax and ELU)
linear_model(peak_trough = "Cmax", dep_var = "ELU")

Linear Model- Visualize independent variable coefficients - ESP

# Sample call to linear_model (Cmax and ESP)
linear_model(peak_trough = "Cmax", dep_var = "ESP")

Linear Model Interpretation

Regression Tree Function

Regression Tree Function code

# Grow a regression tree for ELU on all predictors (rpart package).
# cp = -1 turns off complexity-based pruning, so tree size is
# controlled only by minsplit and minbucket.
rpart(ELU ~ drug + dosage + level + 
        plasma + `Uninvolved lung` + `Rim (of Lesion)` + 
        `Outer Caseum` + `Inner Caseum` + 
        `Standard Lung` + `Standard Lesion` + cLogP + 
        `Human Plasma Binding` + 
        `Mouse Plasma Binding` + `MIC Erdman Strain` + 
        `MIC Erdman Strain with Serum` + 
        `MIC rv strain` + `Caseum binding` + 
        `Macrophage Uptake (Ratio)`,
      data = function_data, 
      control = rpart.control(cp = -1, 
                              minsplit = min_split, 
                              minbucket = min_bucket))
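
As a rough illustration of how minsplit and minbucket shape the tree when cp = -1, the sketch below grows two trees with different settings and counts their leaves. It assumes the rpart package is installed and uses mtcars as a stand-in for the efficacy data; the helpers grow_tree and n_leaves are hypothetical names, not part of the group's regression_tree function.

```r
library(rpart)

# Stand-in data: mtcars, not the efficacy data from the slides.
# With cp = -1, no split is rejected for lack of improvement, so only
# minsplit and minbucket limit how far the tree grows.
grow_tree <- function(min_split, min_bucket) {
  rpart(mpg ~ ., data = mtcars, method = "anova",
        control = rpart.control(cp = -1,
                                minsplit = min_split,
                                minbucket = min_bucket))
}

small_nodes <- grow_tree(min_split = 8, min_bucket = 4)
large_nodes <- grow_tree(min_split = 20, min_bucket = 10)

# Count terminal nodes (leaves) in each tree.
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
c(small = n_leaves(small_nodes), large = n_leaves(large_nodes))
```

Smaller values of minsplit and minbucket allow more splits, which gives a more detailed tree at the cost of leaves that rest on very few observations.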

Regression Tree Function input parameters

regression_tree(dep_var = "ELU", min_split = 8, 
                min_bucket = 4)

Regression Tree Function example (ELU)

Regression Tree Function interpretation

Regression Tree Function example (ESP)

LASSO Function

Least Absolute Shrinkage and Selection Operator

Background: We want to predict the outcome from the variables available to us. LASSO can be viewed as a successor to stepwise regression, and it can handle more variables than samples.

Example
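
The following is an illustrative sketch on simulated data, not the example from the original slides. It assumes the glmnet package is installed and shows the LASSO fitting a model with more variables (50) than samples (20) while setting most coefficients to exactly zero. The variable names v1 to v50 and the penalty value s = 0.5 are arbitrary choices for illustration.

```r
library(glmnet)

set.seed(1)
n <- 20   # samples
p <- 50   # variables (more than samples)
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("v", 1:p)))

# Only the first two predictors truly affect the simulated outcome.
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n, sd = 0.5)

# alpha = 1 selects the LASSO (L1) penalty; this is the glmnet default.
fit <- glmnet(x, y, alpha = 1)

# Coefficients at penalty lambda = 0.5; keep only the non-zero ones.
coefs <- as.matrix(coef(fit, s = 0.5))
selected <- rownames(coefs)[coefs[, 1] != 0]
selected
```

Ordinary least squares cannot be fit uniquely in this setting, whereas the L1 penalty yields a sparse model; larger values of s shrink more coefficients to zero.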

LASSO Function part 1 preparing the data

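LASSO_model <- function(dep_var, dose, df = efficacy_summary) {
  # Drop incomplete rows, keep numeric columns, and keep one dosage
  data <- na.omit(df) %>% 
    select_if(is.numeric) %>%
    filter(dosage == dose)

  # Build the response and predictors from the filtered data, not df
  response <- data %>% 
    select(all_of(dep_var))

  predictors <- data %>%
    select(c("PLA", "ULU", "RIM", "OCS", "ICS", "SLU", "SLE", "cLogP",
             "huPPB", "muPPB", "MIC_Erdman", "MICserumErd",
             "MIC_Rv", "Caseum_binding", "MacUptake"))

  # glmnet expects a numeric response vector and a predictor matrix
  y <- as.numeric(unlist(response))
  x <- as.matrix(predictors)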

LASSO Function part 2, glmnet

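  # Fit the LASSO path (glmnet default, alpha = 1)
  fit <- glmnet(x, y)

  # Coefficients at penalty lambda = 0.1, returned as a data frame
  coeff <- coef(fit, s = 0.1)
  as.data.frame(as.matrix(coeff))
}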

Testing LASSO function:

LASSO_model(dep_var = "ELU", dose = 50)

predictor      coeff
(Intercept)    1.2911027
cLogP          0.2908215
muPPB          0.0049209

RandomForest Function

RandomForest Input

Random Forest Code Example

efficacy.rf <- randomForest(ELU ~ ., data = dataset,
                            na.action = na.roughfix,
                            ntree = 500, 
                            importance = TRUE)
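
The slides do not show the code of best_variables, so the sketch below is not that function. It only illustrates, under the assumption that the randomForest package is installed and with mtcars standing in for the efficacy data, how a forest fitted with importance = TRUE can be used to rank variables by permutation importance.

```r
library(randomForest)

set.seed(42)
# Stand-in data: mtcars, not the efficacy data set from the slides.
# mtcars has no missing values, so na.action = na.roughfix is omitted.
rf <- randomForest(mpg ~ ., data = mtcars,
                   ntree = 500, importance = TRUE)

# type = 1 requests %IncMSE: the increase in out-of-bag mean squared
# error when the values of a variable are randomly permuted.
imp <- importance(rf, type = 1)

# Rank variables from most to least important.
imp_ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
head(imp_ranked, 3)
```

Variables with a larger %IncMSE contribute more to predictive accuracy; values near zero or below suggest the variable adds little beyond the others.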

Random Forest example with ELU

best_variables("ELU", drug = FALSE)

Random Forest example with ESP

best_variables("ESP", drug = TRUE)

Interpretation

Room for errors:

These functions may be prone to several errors if:

Room for errors (continued):

Next steps: